Word Segmentation in Sanskrit Using Path Constrained Random Walks

Authors

  • Amrith Krishna
  • Bishal Santra
  • Pavankumar Satuluri
  • Sasi Prasanth Bandaru
  • Bhumi Faldu
  • Yajuvendra Singh
  • Pawan Goyal
Abstract

In Sanskrit, the phonemes at word boundaries undergo changes to form new phonemes through a process called sandhi. A fused sentence can therefore be segmented in multiple possible ways. We propose a word segmentation approach that predicts the most semantically valid segmentation for a given sentence. We treat the problem as a query expansion problem and use the path-constrained random walks framework to predict the correct segments.
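As a rough illustration of the path-constrained random walk idea mentioned in the abstract, the sketch below scores nodes reachable from a source node by following a fixed sequence of edge types, splitting probability mass uniformly among the neighbours at each step. This is the generic formulation of such walks, not the model described in the paper: the graph, the node names (`w1`, `c1`, ...), the edge types (`cooccurs`, `same_lemma`), and the function `pcrw_scores` are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def pcrw_scores(edges, source, path):
    """Return the probability of reaching each node from `source`
    when the walker must follow the edge types in `path`, in order.

    edges: dict mapping (node, edge_type) -> list of neighbour nodes.
    """
    dist = {source: 1.0}  # all probability mass starts at the source
    for edge_type in path:
        next_dist = defaultdict(float)
        for node, prob in dist.items():
            neighbours = edges.get((node, edge_type), [])
            if not neighbours:
                continue  # no edge of this type: this mass is dropped
            share = prob / len(neighbours)  # uniform split over neighbours
            for neighbour in neighbours:
                next_dist[neighbour] += share
        dist = dict(next_dist)
    return dist


# Hypothetical toy graph: a known word w1 and candidate segments c1..c4.
edges = {
    ("w1", "cooccurs"): ["c1", "c2"],
    ("c1", "same_lemma"): ["c3"],
    ("c2", "same_lemma"): ["c3", "c4"],
}

scores = pcrw_scores(edges, "w1", ["cooccurs", "same_lemma"])
best = max(scores, key=scores.get)
```

In this toy graph, `c3` is reachable along both branches and so collects more probability mass than `c4`. A segmenter built on this idea would presumably combine scores from several path types to rank candidate segments, but how the paper does so is not stated in the abstract.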

Similar resources

Semi-Supervised Chinese Word Segmentation Using Partial-Label Learning With Conditional Random Fields

There is rich knowledge encoded in online web data. For example, punctuation and entity tags in Wikipedia data define some word boundaries in a sentence. In this paper we adopt partial-label learning with conditional random fields to make use of this valuable knowledge for semi-supervised Chinese word segmentation. The basic idea of partial-label learning is to optimize a cost function that mar...

Automatic Sanskrit Segmentizer Using Finite State Transducers

In this paper, we propose a novel method for automatic segmentation of a Sanskrit string into different words. The input for our segmentizer is a Sanskrit string either encoded as a Unicode string or as a Roman transliterated string and the output is a set of possible splits with weights associated with each of them. We followed two different approaches to segment a Sanskrit text using sandhi ...

Design of a lean interface for Sanskrit corpus annotation

We describe an innovative computer interface designed for assisting annotators in the efficient selection of segmentation solutions for proper tagging of Sanskrit corpus. The proposed solution uses a compact representation of the shared forest of all segmentations. The main idea is to represent the union of all segmentations, abstracting on the sandhi rules used, and aligning on the input sente...

SPARSE: Seed Point Auto‐Generation for Random Walks Segmentation Enhancement in medical inhomogeneous targets delineation of morphological MR and CT images

In medical image processing, robust segmentation of inhomogeneous targets is a challenging problem. Because of the complexity and diversity in medical images, the commonly used semiautomatic segmentation algorithms usually fail in the segmentation of inhomogeneous objects. In this study, we propose a novel algorithm imbedded with a seed point autogeneration for random walks segmentation enhance...

Dual Cavity Segmentation of Left and Right Ventricles in Cardiac MRI by Guided Random Walks with Registration

In this paper we propose a new method for accurate segmentation of the left and right ventricles simultaneously in cardiac magnetic resonance images. Our approach is based on guided random walks and registration in order to efficiently exploit the prior shape knowledge. The contribution of the proposed method is in using registration of the pre-segmented data and then guided random walks segmen...

Publication date: 2016